knitr::opts_chunk$set(error = TRUE)
We will need the ‘vegan’ package to test our function.
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-2
Write a function that computes your favorite measures of dis/similarity (at least two should be included). The function should measure pairwise similarity/dissimilarity for two vectors (i.e., function(x, y)). Try to build in some warnings and error messages so that you (and your friends) are warned when something goes wrong.
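As a starting point, here is a minimal sketch of what such a function might look like. It is not the function that produced the output shown in this document: the name pairwise_dissim, the choice of two measures (Jaccard and Bray-Curtis dissimilarity), and the wording of the messages are all assumptions made for illustration.

```r
# Hypothetical sketch: pairwise Jaccard and Bray-Curtis dissimilarity
# for two abundance (or presence-absence) vectors of equal length.
pairwise_dissim <- function(x, y) {
  # Errors: conditions under which no sensible result exists
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("x and y must both be numeric vectors.")
  }
  if (length(x) != length(y)) {
    stop("x and y must have the same length (got ", length(x),
         " and ", length(y), ").")
  }
  if (anyNA(x) || anyNA(y)) {
    stop("Missing values are not allowed; remove or impute NAs first.")
  }
  if (any(x < 0) || any(y < 0)) {
    stop("Abundances must be non-negative.")
  }

  # Presence-absence counts
  shared  <- sum(x > 0 & y > 0)   # taxa present in both samples
  present <- sum(x > 0 | y > 0)   # taxa present in at least one sample
  if (present == 0) {
    stop("Both samples are empty; dissimilarity is undefined.")
  }
  jaccard <- 1 - shared / present

  # Bray-Curtis needs abundances, so return NA (with a warning) for 0/1 data
  is_binary <- all(c(x, y) %in% c(0, 1))
  if (is_binary) {
    warning("Binary data: Bray-Curtis returned as NA because it requires abundances.")
    bray <- NA_real_
  } else {
    bray <- sum(abs(x - y)) / sum(x + y)
  }

  c(jaccard = jaccard, bray_curtis = bray)
}
```

The design choice here is to use stop() when no meaningful answer exists (missing values, unequal lengths) and warning() when only part of the result is unavailable (abundance-based measures on binary data).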
Now create some test data to make sure that your function does what you want it to do.
Do your warnings and error messages work?
# Does your function tell users when they try to use data with missing values?
sample1 <- c(0, 3, NA)
sample2 <- c(9, 4, 12)
mydist(sample1,sample2)
## Error in mydist(sample1, sample2): NO! NO!! NO!!! missing values are not allowed
# Does it warn users who try to use two vectors of different length?
sample1 <- c(0, 3, 4, 56)
sample2 <- c(9, 4, 12)
mydist(sample1,sample2)
## Error in mydist(sample1, sample2): What are you doing??!! You cannot use vectors of unequal length! Gee!
# Does it warn users that, for binary (presence-absence) data, certain estimates cannot be computed?
sample1 <- c(0, 1, 1)
sample2 <- c(1, 1, 0)
mydist(sample1,sample2)
## Warning in mydist(sample1, sample2): binary data: missing values generated
## for measures that require abundance data
## parameter
## 1. Shared taxa 1.0000000
## 2. Present in sample 1 only 1.0000000
## 3. Present in sample 2 only 1.0000000
## 4. Shared absences 0.0000000
## 5. Total number of species present 3.0000000
## 6. Total number of specimens NA
## 7. Total number of specimens in sample 1 NA
## 8. Total number of specimens in sample 2 NA
## 9. Total number of occurrences 4.0000000
## 10. total number of occurrences in sample 1 2.0000000
## 11. Total number of occurrences in sample 2 2.0000000
## 12. Simple Matching Coefficient 0.3333333
## 13. Jaccard Similarity 0.3333333
## 14. Sorenson Similarity 0.5000000
## 15. Jaccard Dissimilarity 0.6666667
## 16. Sorenson Dissimilarity 0.5000000
## 17. Forbes-Alroy Similarity 0.7593088
## 18. Percentage Similarity NA
## 19. Bray Curtis Dissimilarity NA
## 20. Jaccard-Chao Similarity NA
## 21. Jaccard-Chao Similarity Adj NA
## 22. Sorenson-Chao Similarity NA
## 23. Sorenson-Chao Similarity Adj NA
## 24. Jaccard-Chao Dissimilarity NA
## 25. Jaccard-Chao Dissimilarity Adj NA
## 26. Sorenson-Chao Dissimilarity NA
## 27. Sorenson-Chao Dissimilarity Adj NA
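The presence-absence rows of this output can be reproduced by hand from the four cells of a 2 x 2 presence-absence table. The sketch below assumes the conventional definitions (simple matching = (shared + absent) / n, Jaccard = shared / (shared + only1 + only2), Sorensen = 2 * shared / (2 * shared + only1 + only2)). The variable names are introduced here for illustration and are not taken from mydist.

```r
sample1 <- c(0, 1, 1)
sample2 <- c(1, 1, 0)

# Cells of the 2 x 2 presence-absence table
shared <- sum(sample1 > 0 & sample2 > 0)     # present in both samples
only1  <- sum(sample1 > 0 & sample2 == 0)    # present in sample 1 only
only2  <- sum(sample1 == 0 & sample2 > 0)    # present in sample 2 only
absent <- sum(sample1 == 0 & sample2 == 0)   # absent from both samples

# Binary similarity coefficients
smc      <- (shared + absent) / length(sample1)
jaccard  <- shared / (shared + only1 + only2)
sorensen <- 2 * shared / (2 * shared + only1 + only2)

round(c(smc = smc, jaccard = jaccard, sorensen = sorensen), 7)
```

These values (0.3333333, 0.3333333 and 0.5) agree with rows 12 to 14 of the output above.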
Obviously, if you take your R functions seriously, you should never write error messages that needlessly insult users.
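One way to keep messages informative rather than insulting is to state what was expected and what was received. The helper below is a hypothetical sketch (check_lengths is not part of mydist). tryCatch() is used to confirm that the error fires and to inspect its message without halting the script.

```r
# Hypothetical helper: a neutral, informative error for unequal lengths
check_lengths <- function(x, y) {
  if (length(x) != length(y)) {
    # call. = FALSE drops the "Error in check_lengths(...)" prefix
    stop("x and y must have the same length: x has ", length(x),
         " elements, y has ", length(y), ".", call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the error instead of halting, then look at its message
msg <- tryCatch(
  check_lengths(c(0, 3, 4, 56), c(9, 4, 12)),
  error = function(e) conditionMessage(e)
)
msg
```

A message of this kind tells the user exactly which input to fix, which is usually more helpful than a generic complaint.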
Now test it against a well-established function such as vegdist() from the vegan package.
# Most important: does it compute the parameters correctly?
sample1 <- c(0,15,3,42,0,0,1,12)
sample2 <- c(7,11,0,0,0,32,78,6)
mydist(sample1,sample2)
## parameter
## 1. Shared taxa 3.0000000
## 2. Present in sample 1 only 2.0000000
## 3. Present in sample 2 only 2.0000000
## 4. Shared absences 1.0000000
## 5. Total number of species present 7.0000000
## 6. Total number of specimens 207.0000000
## 7. Total number of specimens in sample 1 73.0000000
## 8. Total number of specimens in sample 2 134.0000000
## 9. Total number of occurrences 10.0000000
## 10. total number of occurrences in sample 1 5.0000000
## 11. Total number of occurrences in sample 2 5.0000000
## 12. Simple Matching Coefficient 0.5000000
## 13. Jaccard Similarity 0.4285714
## 14. Sorenson Similarity 0.6000000
## 15. Jaccard Dissimilarity 0.5714286
## 16. Sorenson Dissimilarity 0.4000000
## 17. Forbes-Alroy Similarity 0.8282635
## 18. Percentage Similarity 0.1739130
## 19. Bray Curtis Dissimilarity 0.8260870
## 20. Jaccard-Chao Similarity 0.3313816
## 21. Jaccard-Chao Similarity Adj 0.3829736
## 22. Sorenson-Chao Similarity 0.4978011
## 23. Sorenson-Chao Similarity Adj 0.5538408
## 24. Jaccard-Chao Dissimilarity 0.6686184
## 25. Jaccard-Chao Dissimilarity Adj 0.6170264
## 26. Sorenson-Chao Dissimilarity 0.5021989
## 27. Sorenson-Chao Dissimilarity Adj 0.4461592
vegdist(rbind(sample1,sample2), 'chao')
## sample1
## sample2 0.6170264
vegdist(rbind(sample1,sample2), 'bray')
## sample1
## sample2 0.826087
vegdist(rbind(sample1,sample2), 'jaccard', binary=TRUE)
## sample1
## sample2 0.5714286
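Rather than comparing printed numbers by eye, the agreement can also be checked programmatically. The sketch below recomputes Bray-Curtis and Jaccard dissimilarity by hand in base R and, if vegan is installed, compares them with vegdist() using all.equal(), which tolerates small floating-point differences. The names bray_manual and jaccard_manual are introduced here for illustration.

```r
sample1 <- c(0, 15, 3, 42, 0, 0, 1, 12)
sample2 <- c(7, 11, 0, 0, 0, 32, 78, 6)

# Bray-Curtis: sum of absolute differences over total abundance
bray_manual <- sum(abs(sample1 - sample2)) / sum(sample1 + sample2)

# Jaccard dissimilarity on presence-absence
shared  <- sum(sample1 > 0 & sample2 > 0)
present <- sum(sample1 > 0 | sample2 > 0)
jaccard_manual <- 1 - shared / present

# Compare against vegan only when the package is available
if (requireNamespace("vegan", quietly = TRUE)) {
  m <- rbind(sample1, sample2)
  print(all.equal(bray_manual,
                  as.numeric(vegan::vegdist(m, method = "bray"))))
  print(all.equal(jaccard_manual,
                  as.numeric(vegan::vegdist(m, method = "jaccard", binary = TRUE))))
}
```

Both comparisons should print TRUE if the hand calculation and vegdist() agree.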
Cite packages
citation('vegan')
##
## To cite package 'vegan' in publications use:
##
## Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, Roeland
## Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, R. B.
## O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens,
## Eduard Szoecs and Helene Wagner (2018). vegan: Community Ecology
## Package. R package version 2.5-2.
## https://CRAN.R-project.org/package=vegan
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {vegan: Community Ecology Package},
## author = {Jari Oksanen and F. Guillaume Blanchet and Michael Friendly and Roeland Kindt and Pierre Legendre and Dan McGlinn and Peter R. Minchin and R. B. O'Hara and Gavin L. Simpson and Peter Solymos and M. Henry H. Stevens and Eduard Szoecs and Helene Wagner},
## year = {2018},
## note = {R package version 2.5-2},
## url = {https://CRAN.R-project.org/package=vegan},
## }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.